---
title: "ETC5513 Assignment 4 - Team StarWars"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    navbar:
      - { title: "About", href: "https://github.com/mohammedfaizan0014/etc5513-assignment-4-star-wars/blob/main/README.md", align: left }
    social: [ "twitter", "facebook", "menu" ]
    source_code: embed
---

```{r echo=FALSE, include=FALSE}
knitr::opts_chunk$set(fig.path = "Figures/", fig.align ="center",
                      out.width = "50%", echo = FALSE, 
                      message = FALSE, 
                      warning = FALSE)
# Loading Libraries
library(tidyverse)
library(readr)
library(kableExtra)
library(tinytex)
library(bookdown)
library(naniar)
library(visdat)
library(citation)
library(knitr)
library(scales)
library(patchwork)
library(sf)
library(glue)
library(unglue)
library(sugarbag)
library(readxl)
library(plotly)
library(tidytext)
library(ggplot2)
```


```{r writing_packages_bibliographies}
knitr::write_bib(c(.packages()), "packages.bib")
```


```{r}
celldiscriptors <- read_excel(here::here("data/australian_census_data_2016/Metadata/Metadata_2016_GCP_DataPack.xlsx"), sheet=2, skip=10)
```


```{r}
data_path <- here::here("data/australian_census_data_2016/")
```


```{r}
# Build one path per census table part (G46A, G46B, G47A, ...).
# glue() pairs `number` and `alpha` element-wise; `data_path` is set in the chunk above.
census_paths <- glue::glue(
  data_path,
  "/2016 Census GCP All Geographies for VIC/SA4/VIC/2016Census_G{number}{alpha}_VIC_SA4.csv",
  number = c("46", "46", "47", "47", "47", "51", "51", "51", "51", "57", "57", "52", "52", "52", "52", "58", "58"),
  alpha  = c("A", "B", "A", "B", "C", "A", "B", "C", "D", "A", "B", "A", "B", "C", "D", "A", "B")
)
```
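
As a minimal sketch of how the path vector is assembled: `glue()` substitutes vectors element-wise, so each `number`/`alpha` pair yields one file name. The shortened template and the three-element vectors below are illustrative assumptions, not the real census paths.

```r
library(glue)

# Illustrative, shortened template: the real paths include the full directory.
number <- c("46", "46", "47")
alpha  <- c("A", "B", "A")

# glue() fills {number} and {alpha} pairwise, returning one string per pair.
toy_paths <- glue("2016Census_G{number}{alpha}_VIC_SA4.csv")
toy_paths
```

Because the result is positional, later code that indexes `census_paths[3:4]` depends on the order of the two vectors staying in sync.
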
```{r geopath, include=FALSE}
geopath <- glue::glue(data_path, "/2016_SA4_shape/SA4_2016_AUST.shp")
sa4_codes<- read_csv(census_paths[2]) %>% 
                mutate(SA4_CODE_2016 = as.character(SA4_CODE_2016)) %>% 
                select(SA4_CODE_2016)
sa4_geomap <- read_sf(geopath) %>%
  right_join(sa4_codes, by=c("SA4_CODE16" = "SA4_CODE_2016"))
```
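
The join above uses differently named keys on each side. The following sketch, built on made-up toy tables rather than the real shapefile, shows which key name survives such a join in `dplyr`.

```r
library(dplyr)

# Toy stand-ins (assumed values) for the shapefile attributes and the census codes.
shapes <- tibble(SA4_CODE16 = c("206", "213", "301"),
                 SA4_NAME16 = c("Region A", "Region B", "Region C"))
codes  <- tibble(SA4_CODE_2016 = c("206", "213"))

# right_join() keeps every row of `codes`; the key column keeps the name used in `shapes`.
joined <- right_join(shapes, codes, by = c("SA4_CODE16" = "SA4_CODE_2016"))
names(joined)
```

Any later join against the result therefore has to refer to the key by the left-hand name.
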


Amber (G46 and G47 Analysis) {data-icon="G46 and G47"}
=============================
Row {data-width=150}
--------------------------------------

```{r g46read, include=FALSE}
g46a<- read_csv(census_paths[1]) %>%
  select(-starts_with("P"), -contains("Tot"), -contains("nfd"), -contains("IDes")) %>%
            mutate(SA4_CODE_2016 = as.character(SA4_CODE_2016)) %>% 
  pivot_longer(cols = -c(SA4_CODE_2016),
                  names_to = "category",
                  values_to = "count") %>%
  unglue_unnest(category, 
                    c("{sex=[MF]}_{educationlevel=GradDip_and_GradCert}_{age_min=\\d+}_{age_max=\\d+}",
                      "{sex=[MF]}_{educationlevel=PGrad_Deg}_{age_min=\\d+}_{age_max=\\d+}",
                      "{sex=[MF]}_{educationlevel=BachDeg}_{age_min=\\d+}_{age_max=\\d+}",
                      "{sex=[MF]}_{educationlevel=AdvDip_and_Dip}_{age_min=\\d+}_{age_max=\\d+}",
                      "{sex=[MF]}_{educationlevel=Cert_III_IV}_{age_min=\\d+}_{age_max=\\d+}",
                      "{sex=[MF]}_{educationlevel=Cert_I_II}_{age_min=\\d+}_{age_max=\\d+}",
                      "{sex=[MF]}_{educationlevel=Lev_Edu_NS}_{age_min=\\d+}_{age_max=\\d+}",
                      "{sex=[MF]}_{educationlevel=Lev_Edu_NS|GradDip_and_GradCert|PGrad_Deg|BachDeg|AdvDip_and_Dip|Cert_III_IV|Cert_I_II}_{age_min=\\d+}ov"
                      
                      ),
                remove = FALSE) %>% 
  select(-category)
  
```
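
To illustrate what the `unglue_unnest()` step does, here is a small sketch on two hypothetical column names written in the style the patterns expect. The names and the reduced pattern list are assumptions for illustration only.

```r
library(unglue)

# Hypothetical column names in the style the patterns above expect.
cols <- data.frame(category = c("M_BachDeg_25_34", "F_BachDeg_85ov"),
                   stringsAsFactors = FALSE)

# Patterns are tried in order; each {name=regex} becomes a new column.
parsed <- unglue_unnest(
  cols, category,
  c("{sex=[MF]}_{educationlevel=BachDeg}_{age_min=\\d+}_{age_max=\\d+}",
    "{sex=[MF]}_{educationlevel=BachDeg}_{age_min=\\d+}ov"),
  remove = FALSE
)
parsed
```

Rows matched by the open-ended pattern have no `age_max`, so that column should come back as `NA` for them.
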

```{r}
g46a <- g46a %>% 
  mutate(afq_level =case_when(str_detect(educationlevel, "GradDip_and_GradCert") ~ "Level 8",
                            str_detect(educationlevel, "PGrad") ~ "Level 9",
                            str_detect(educationlevel, "BachDeg") ~ "Level 7",
                            str_detect(educationlevel, "AdvDip_and_Dip") ~ "Level 5 & 6",
                            str_detect(educationlevel, "Cert_III_IV") ~ "Level 3 & 4",
                            str_detect(educationlevel, "Cert_I_II") ~ "Level 1 & 2",
                            str_detect(educationlevel, "Cert_Levl_nfd") ~ "Level 3 & 4",
                            str_detect(educationlevel, "Lev_Edu_IDes") ~ "Level Inadequately Described",
                            str_detect(educationlevel, "Lev_Edu_NS") ~ "Not Stated",
                            TRUE ~ educationlevel)) %>% 
  rename(count_edu_lvl = count)
```
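
The recoding above relies on `case_when()` returning the right-hand side of the first condition that is true for each element. A minimal sketch with made-up codes:

```r
library(dplyr)
library(stringr)

codes <- c("BachDeg", "Cert_III_IV", "SomethingElse")

labels <- case_when(
  str_detect(codes, "BachDeg")     ~ "Level 7",
  str_detect(codes, "Cert_III_IV") ~ "Level 3 & 4",
  TRUE                             ~ codes   # fall back to the raw code
)
labels
```

Since `str_detect()` matches substrings, a short pattern placed early can capture codes meant for a later branch, so the order of the branches matters.
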


Row {data-height=600}
-----------------------------------------------------------------------
### Chart A

- Level 7 (bachelor degree) is the most common qualification among residents, with almost twice as many women as men at this level.

- The majority of male residents hold a Level 3 or 4 qualification (Certificate III or IV).

```{r edu_gender, fig.cap = "Education level count by gender"}
ggplot(g46a, aes(x = reorder(afq_level,count_edu_lvl),
                      y = count_edu_lvl,
                      fill = sex)) +
         geom_col() +
         labs(x = "AQF levels",
              y = "number of observations",
              title = "Education level by gender") +
         scale_y_continuous(label=label_number())+
         coord_flip()
```
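
One detail worth keeping in mind with `reorder()`: unless told otherwise it orders categories by the mean of the second argument, while a stacked `geom_col()` displays the sum. The toy vectors below are assumed values that make the difference visible.

```r
# Toy data (assumed values): level "a" has several small rows, "b" one larger row.
level <- c("a", "a", "a", "b")
count <- c(2, 2, 2, 3)

by_mean <- levels(reorder(level, count))             # default FUN = mean
by_sum  <- levels(reorder(level, count, FUN = sum))  # order by stacked total

by_mean
by_sum
```

If the bars should be ordered by their total height, passing `FUN = sum` is one way to do it.
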


Row {data-height=600}
-----------------------------------------------------------------------
### Chart B

```{r}
g47 <- map_dfr(census_paths[3:4], ~{
  df <- read_csv(.x) %>%
      select(-starts_with("P"), -contains("Tot"), -contains("InadDes")) %>%
            mutate(SA4_CODE_2016 = as.character(SA4_CODE_2016)) %>% 
  pivot_longer(cols = -c(SA4_CODE_2016),
                  names_to = "category",
                  values_to = "count") %>%
  unglue_unnest(category, 
                    c("{sex=[MF]}_{field=(Mgnt_Com|Society_Cult|Fd_Hosp_Psnl_Svcs|MixFld_Prgm|FldStd_NS|NatPhyl_Scn|InfoTech|Eng_RelTec|ArchtBldng|Ag_Envir_Rltd_Sts|Health|Educ|Creative_Arts)}_{age_min=\\d+}_{age_max=\\d+}",
                      "{sex=[MF]}_{field=(Mgnt_Com|Society_Cult|Fd_Hosp_Psnl_Svcs|MixFld_Prgm|FldStd_NS|NatPhyl_Scn|InfoTech|Eng_RelTec|ArchtBldng|Ag_Envir_Rltd_Sts|Health|Educ|Creative_Arts)}_{age_min=\\d+}ov",
                      "{sex=[MF]}_{field=(Mgnt_Com|Society_Cult|Fd_Hosp_Psnl_Svcs|MixFld_Prgm|FldStd_NS|NatPhyl_Scn|InfoTech|Eng_RelTec|ArchtBldng|Ag_Envir_Rltd_Sts|Health|Educ|Creative_Arts)}_{age_min=\\d+}_years_and_over"
                      
                      ),
                remove = FALSE)
})

```
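
The read step above maps over several file paths and row-binds the results. A self-contained sketch of that idea, using two throwaway CSV files with made-up contents in place of the census parts:

```r
library(purrr)
library(readr)

# Two temporary CSV files standing in for the census parts (contents are made up).
paths <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
write_csv(data.frame(SA4_CODE_2016 = 206, M_Health_15_24 = 10), paths[1])
write_csv(data.frame(SA4_CODE_2016 = 206, M_Educ_15_24 = 7), paths[2])

# map_dfr() applies the function to each path and row-binds the results;
# columns missing from one file are filled with NA.
combined <- map_dfr(paths, ~ read_csv(.x, show_col_types = FALSE))
combined
```

Pivoting each file to long format before binding, as the dashboard code does, avoids the `NA` padding because every part then shares the same columns.
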

```{r}
g47 <- g47 %>% 
  mutate(field =case_when(
    str_detect(field, "NatPhyl_Scn") ~ "Natural_and_Physical_Sciences",
                            str_detect(field, "InfoTech") ~ "Information_Technology",
                            str_detect(field, "Eng_RelTec") ~ "Engineering_and_Technologies",
                            str_detect(field, "ArchtBldng") ~ "Architecture_and_Building",
                            str_detect(field, "Ag_Envir_Rltd_Sts") ~ "Agriculture_Environment",
                            str_detect(field, "Health") ~ "Health",
                            str_detect(field, "Educ") ~ "Education",
                            str_detect(field, "Mgnt_Com") ~ "Management_and_Commerce",
                            str_detect(field, "Society_Cult") ~ "Society_and_Culture",
                            str_detect(field, "Creative_Arts") ~ "Creative_Arts",
                            str_detect(field, "Fd_Hosp_Psnl_Svcs") ~ "Food_Hospitality_and_Personal_Services",
                            str_detect(field, "MixFld_Prgm") ~ "Mixed_Field_Programmes",
                            str_detect(field, "FldStd_NS") ~ "Not Stated",
                            TRUE ~ field))  %>%
  select(-category) %>%
  rename(count_field = count)

```


- Management and Commerce is the most commonly studied field.

```{r}
ggplot(g47, aes(x = reorder(field,count_field),
                      y = count_field,
                      fill = sex)) +
         geom_col() +
         labs(x = "Field",
              y = "number of observations",
              title = "Field by gender") +
  scale_y_continuous(label=label_number()) +
  coord_flip()
```

Row {data-height=500}
-----------------------------------------------------------------------

### Chart C

```{r popareafield, fig.cap="Population distribution of field"}
popareafield <- g47 %>%
  group_by(SA4_CODE_2016, field) %>%
  summarise(count_fieldarea = sum(count_field)) %>%
  ungroup()

popareafieldmax <- popareafield %>% 
  select(1:3) %>%
  group_by(SA4_CODE_2016) %>%
  slice_max(count_fieldarea) %>%
  arrange(SA4_CODE_2016)

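popareafieldmax %>% 
  # sa4_geomap keeps the shapefile's key name (SA4_CODE16) after the earlier right_join
  full_join(sa4_geomap, 
            by = c("SA4_CODE_2016" = "SA4_CODE16")) %>%
  ggplot() +
  geom_sf(mapping = aes(geometry = geometry, fill = field)) +
  # a constant colour belongs outside aes(); inside aes() it is treated as data
  geom_sf_text(aes(geometry = geometry, label = field), colour = "white", check_overlap = TRUE) +
  theme_void() +
  theme(legend.position = "none")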
```

- "Not Stated" appears to dominate the map by area; however, the population is concentrated in the green region.
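
The map is driven by picking the largest field per region. A minimal sketch of that step with made-up counts, which also shows how ties behave:

```r
library(dplyr)

# Made-up counts per region and field.
toy <- tibble(
  region = c("206", "206", "213", "213"),
  field  = c("Health", "Education", "Health", "Education"),
  count  = c(50, 80, 40, 40)
)

top_field <- toy %>%
  group_by(region) %>%
  slice_max(count, n = 1) %>%   # ties are kept by default (with_ties = TRUE)
  ungroup()
top_field
```

Region 213 returns two rows here because of the tie; `with_ties = FALSE` would keep a single row per region if that is what the map needs.
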

Row {data-height=500}
-----------------------------------------------------------------------

### Chart D

```{r bestfield, fig.cap="Best field of each region"}
bestfield <- popareafield %>% 
  select(1:3) %>%
  group_by(field) %>%
  slice_max(count_fieldarea) %>%
  arrange(SA4_CODE_2016)

bestfield %>%
  ggplot() +
  geom_col(mapping = aes(x = reorder_within(field,count_fieldarea, SA4_CODE_2016), y = count_fieldarea, fill = field)) +
  labs(title = "Region and Best Field", 
       x = "Fields with region code",
       y = "Number of Students") +
  scale_y_continuous(label=label_number()) +
  coord_flip()  +
  theme(legend.position = "none")
```

- Management and Commerce ranks first both in region 206 and across all SA4 regions, followed by Engineering and Technologies.

- Mixed Field Programmes appears to be the least popular field across the SA4 regions.
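
The axis labels in this chart combine a field with a region code because of how `tidytext::reorder_within()` builds its factor. A small sketch with assumed values:

```r
library(tidytext)

# reorder_within() pastes the grouping value onto each label (separator "___")
# and orders the resulting factor by the supplied values.
lab <- reorder_within(c("Health", "Education"), c(10, 20), c("206", "213"))
levels(lab)
```

If only the field name should be shown, `scale_x_reordered()` from the same package strips the suffix from the axis labels.
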


Adarsh (G52 Analysis) {data-icon="G52 and G58"}
=============================
Row {data-width=150}
--------------------------------------

```{r}
g52 <- map_dfr(census_paths[12:14], ~{
  df <- read_csv(.x) %>%
      select(-starts_with("P"), -contains("Tot")) %>%
            mutate(SA4_CODE_2016 = as.character(SA4_CODE_2016)) %>% 
  pivot_longer(cols = -c(SA4_CODE_2016),
                  names_to = "category",
                  values_to = "count") %>%
  unglue_unnest(category, 
                    c("{sex=[MF]}_{industry=(AgriForestFish|Min|Mnfg|EGW_WS|Cnstn|WTrade|RTrade|AccomFoodS|TransPostWhse|InfoMedTelecom|FinInsurS|RentHirREserv|ProScieTechServ|AdminSupServ|PubAdmiSafety|EducTrain|HealthCareSocA|ArtRecServ|OthServ|ID_NS)}_{hr_min=\\d+}_{hr_max=\\d+}",
                      "{sex=[MF]}_{industry=(AgriForestFish|Min|Mnfg|EGW_WS|Cnstn|WTrade|RTrade|AccomFoodS|TransPostWhse|InfoMedTelecom|FinInsurS|RentHirREserv|ProScieTechServ|AdminSupServ|PubAdmiSafety|EducTrain|HealthCareSocA|ArtRecServ|OthServ|ID_NS)}_{hr_min=\\d+}",
                      "{sex=[MF]}_{industry=(AgriForestFish|Min|Mnfg|EGW_WS|Cnstn|WTrade|RTrade|AccomFoodS|TransPostWhse|InfoMedTelecom|FinInsurS|RentHirREserv|ProScieTechServ|AdminSupServ|PubAdmiSafety|EducTrain|HealthCareSocA|ArtRecServ|OthServ|ID_NS)}_{hr_min=\\d+}over"
                     ),
                remove = FALSE)
})
```
```{r}
g52 <- g52 %>% 
  mutate(industry =case_when(
                            str_detect(industry, "AgriForestFish") ~ "Agriculture_forestry_and_fishing",
                            str_detect(industry, "Min") ~ "Mining",
                            str_detect(industry, "Mnfg") ~ "Manufacturing",
                            str_detect(industry, "EGW_WS") ~ "Electricity_gas_water_and_waste_service",
                            str_detect(industry, "Cnstn") ~ "Construction",
                            str_detect(industry, "WTrade") ~ "Wholesale_trade",
                            str_detect(industry, "RTrade") ~ "Retail_trade",
                            str_detect(industry, "AccomFoodS") ~ "Accommodation_and_food_services",
                            str_detect(industry, "TransPostWhse") ~ "Transport_postal_and_warehousing",
                            str_detect(industry, "InfoMedTelecom") ~ "Information_media_and_telecommunications",
                            str_detect(industry, "FinInsurS") ~ "Financial_and_insurance_services",
                            str_detect(industry, "RentHirREserv") ~ "Rental_hiring_and_real_estate_services",
                            str_detect(industry, "ProScieTechServ") ~ "Professional_scientific_and_technical_services",
                            str_detect(industry, "AdminSupServ") ~ "Administrative_and_support_services",
                            str_detect(industry, "PubAdmiSafety") ~ "Public_administration_and_safety",
                            str_detect(industry, "EducTrain") ~ "Education_and_training",
                            str_detect(industry, "HealthCareSocA") ~ "Health_care_and_social_assistance",
                            str_detect(industry, "ArtRecServ") ~ "Arts_and_recreation_services",
                            str_detect(industry, "OthServ") ~ "Other_services",
                            str_detect(industry, "ID_NS") ~ "Not Stated",
                            TRUE ~ industry))  %>%
  select(-category) %>%
  rename(count_industry = count)
```

```{r}
g58 <- map_dfr(census_paths[16], ~{
  df <- read_csv(.x) %>%
      select(-starts_with("P"), -contains("Tot")) %>%
            mutate(SA4_CODE_2016 = as.character(SA4_CODE_2016)) %>% 
  pivot_longer(cols = -c(SA4_CODE_2016),
                  names_to = "category",
                  values_to = "count") %>%
  unglue_unnest(category, 
                    c("{sex=[MF]}_{occupation=(Mng|Pro|TTW|CPS|CA|Sal|MOD|Lab|ID_NS)}_{hrs_min=\\d+}_{hrs_max=\\d+}",
                      "{sex=[MF]}_{occupation=(Mng|Pro|TTW|CPS|CA|Sal|MOD|Lab|ID_NS)}_{hrs_min=\\d+}",
                      "{sex=[MF]}_{occupation=(Mng|Pro|TTW|CPS|CA|Sal|MOD|Lab|ID_NS)}_{hrs_min=\\d+}over"
                      ),  
                remove = FALSE)  
})
```
```{r}
g58 <- g58 %>% 
  mutate(occupation =case_when(
                            str_detect(occupation, "Mng") ~ "Managers",
                            str_detect(occupation, "Pro") ~ "Professionals",
                            str_detect(occupation, "TTW") ~ "Technicians_and_trades_workers",
                            str_detect(occupation, "TechnicTrades_Wrs") ~ "Technicians_and_trades_workers",
                            str_detect(occupation, "CPS") ~ "Community_and_personal_service_workers",
                            str_detect(occupation, "CA") ~ "Clerical_and_administrative_workers",
                            str_detect(occupation, "Sal") ~ "Sales_workers",
                            str_detect(occupation, "MOD") ~ "Machinery_operators_and_drivers",
                            # "Lab" is parsed by the patterns above but was left unlabelled
                            str_detect(occupation, "Lab") ~ "Labourers",
                            str_detect(occupation, "ID_NS") ~ "Not Stated",
                            TRUE ~ occupation))  %>%
  select(-category) %>%
  rename(count_occupation = count)
```

Row {data-width=420}
-----------------------------------------------------------------------
### Chart A

- Both figures show that, overall, females worked more than males. However, as the number of hours worked increased, males worked more than females.

```{r, include=FALSE}
p1 <- g52 %>% 
  mutate(hr_min = as.numeric(hr_min)) %>% 
  summarise(hr_min = sum(hr_min, na.rm = TRUE))
p2 <- g52 %>% 
  mutate(hr_max = as.numeric(hr_max)) %>% 
  summarise(hr_max = sum(hr_max, na.rm = TRUE))
```

```{r hr_plots, fig.show='hold', out.width="50%"}
p1 <- g52 %>%
  ggplot(
           mapping = aes(x = hr_min,
                         y = count_industry,
                         fill = sex)) +
           geom_bar(stat = "identity",
                            position = "dodge") +
           theme_bw() +
           xlab("Minimum Hours") +
           ylab("Count") +
           ggtitle("Min hours worked for Industries")
p1

p2 <- g52 %>%
  ggplot(
           mapping = aes(x = hr_max,
                         y = count_industry,
                         fill = sex)) +
           geom_bar(stat = "identity",
                            position = "dodge") +
           theme_bw() +
            xlab("Maximum Hours") +
            ylab("Count") +
           ggtitle("Max hours worked for Industries")
p2
```
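
The hour bounds extracted from the column names are text, and a discrete axis sorts text alphabetically. The sketch below, using made-up hour values, shows the difference and one way to keep hour order on a discrete axis.

```r
hrs <- c("9", "16", "1", "40")   # made-up hour bounds stored as text

sort(hrs)               # text sort: "1" "16" "40" "9"
sort(as.numeric(hrs))   # numeric sort: 1 9 16 40

# A factor with numerically ordered levels keeps a discrete axis in hour order.
hrs_f <- factor(hrs, levels = as.character(sort(as.numeric(hrs))))
levels(hrs_f)
```
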

Row {data-height=200}
-----------------------------------------------------------------------
### Chart B

- The figure shows that industries such as health care, education and training, construction, and professional, scientific and technical services have a larger working population as working hours increase. Mining and electricity, gas, water and waste services show a low working population irrespective of hours worked.

```{r ind_hrs}
g52redundanthrs <-  g52[rep(rownames(g52), g52$count_industry), ]


hrindcount <- g52redundanthrs %>%
  ggplot(mapping = aes(x = hr_min, y = industry)) +
  geom_count() +
  labs(title = "Population: Industries and hours", x = "Hours") +
  theme(axis.title.y = element_blank())

ggplotly(hrindcount)
```
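
The chunk above expands each row once per counted person so that `geom_count()` can tally them. As a hedged alternative sketch, `tidyr::uncount()` performs the same expansion; the toy data frame and its column `n` are assumptions for illustration.

```r
library(tidyr)

toy <- data.frame(industry = c("Mining", "Health"),
                  hr_min   = c("1", "16"),
                  n        = c(2, 3))

# uncount() repeats each row `n` times and drops the weight column by default.
expanded <- uncount(toy, n)
nrow(expanded)
```

For large census counts this expansion can produce a very big table, so mapping the count to point size directly may be lighter.
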


Adarsh (G58 Analysis) {data-icon="G52 and G58"}
=============================
Row {data-width=400}
-----------------------------------------------------------------------
### Chart C

- The figure shows that, overall, females worked more than males in all occupations. For maximum hours worked, however, the numbers of men and women remained the same as working hours increased.

```{r, include=FALSE}
p3 <- g58 %>% 
  mutate(hrs_min = as.numeric(hrs_min)) %>% 
  summarise(hrs_min = sum(hrs_min, na.rm = TRUE))
p4 <- g58 %>% 
  mutate(hrs_max = as.numeric(hrs_max)) %>% 
  summarise(hrs_max = sum(hrs_max, na.rm = TRUE))
```

```{r hrs_plots, fig.show='hold', out.width="50%"}
p3 <- g58 %>%
  ggplot(
           mapping = aes(x = hrs_min,
                         y = count_occupation,
                         fill = sex)) +
           geom_bar(stat = "identity",
                            position = "dodge") +
           theme_bw() +
           xlab("Minimum Hours") +
           ylab("Count") +
           ggtitle("Min hours worked at Occupation")
          
p3

p4 <- g58 %>%
  ggplot(
           mapping = aes(x = hrs_max,
                         y = count_occupation,
                         fill = sex)) +
           geom_bar(stat = "identity",
                            position = "dodge") +
           theme_bw() +
           xlab("Maximum Hours") +
           ylab("Count") +
           ggtitle("Max hours worked at Occupation")
p4
```

Row {data-height=250}
-----------------------------------------------------------------------
### Chart D

- The figure shows that most employees in the SA4 regions work as Professionals, Managers, or Technicians and Trades Workers. Professionals accounted for the highest number of employees (region 206), while Machinery Operators and Drivers accounted for the lowest (region 213).

```{r popareaoccupation, fig.cap=""}

popareaoccupation <- g58 %>%
  group_by(SA4_CODE_2016, occupation) %>%
  summarise(count_occupationarea = sum(count_occupation)) %>%
  ungroup()

popareaoccupationmax <- popareaoccupation %>% 
  select(1:3) %>%
  group_by(SA4_CODE_2016) %>%
  slice_max(count_occupationarea) %>%
  arrange(SA4_CODE_2016)
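popareaoccupationmax %>% 
  # sa4_geomap keeps the shapefile's key name (SA4_CODE16) after the earlier right_join
  full_join(sa4_geomap, 
            by = c("SA4_CODE_2016" = "SA4_CODE16")) %>%
  ggplot() +
  geom_sf(mapping = aes(geometry = geometry, fill = occupation)) +
  # a constant colour belongs outside aes(); inside aes() it is treated as data
  geom_sf_text(aes(geometry = geometry, label = occupation), colour = "white", check_overlap = TRUE) +
  theme_void() +
  theme(legend.position = "bottom")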
bestfield <- popareaoccupation %>% 
  select(1:3) %>%
  group_by(occupation) %>%
  slice_max(count_occupationarea) %>%
  arrange(SA4_CODE_2016)
	
bestfield %>%
  ggplot() +
  geom_col(mapping = aes(x = reorder_within(occupation,count_occupationarea, SA4_CODE_2016), y = count_occupationarea, fill = occupation)) +
  labs(title = "Region and Best Occupation", y = "Number of Employees") +
  scale_y_continuous(label=label_number()) +
  coord_flip()  +
  theme(legend.position = "none")
```